Solving Prediction Games with Parallel Batch Gradient Descent
Authors
Abstract
Learning problems in which an adversary can perturb instances at application time can be modeled as games with data-dependent cost functions. In an equilibrium point, the learner’s model parameters are the optimal reaction to the data generator’s perturbation, and vice versa. Finding an equilibrium point requires the solution of a difficult optimization problem in which both the learner’s model parameters and the possible perturbations are free parameters. We study a perturbation model and derive optimization procedures that use a single iteration of batch-parallel gradient descent and a subsequent aggregation step, thus allowing for parallelization with minimal synchronization overhead.
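As a rough illustration of the pattern described in the abstract, the sketch below runs one local gradient step per data shard (descent in the learner's weights, ascent in a per-instance perturbation), followed by an aggregation step that averages the local weight copies. The squared-loss objective, the additive perturbation model, and the step sizes are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def local_step(w, delta, X, y, lr=0.1, lam=0.1, gamma=1.0):
    """One local batch gradient step on the shard (X, y): the learner descends
    on a regularized squared loss over perturbed inputs X + delta, while the
    adversary ascends on the same loss minus a quadratic perturbation cost."""
    X_pert = X + delta
    residual = X_pert @ w - y
    grad_w = X_pert.T @ residual / len(y) + lam * w               # learner gradient
    grad_delta = np.outer(residual, w) / len(y) - gamma * delta   # adversary gradient
    return w - lr * grad_w, delta + lr * grad_delta

def parallel_round(w, deltas, shards):
    """One round: independent local steps (parallelizable), then aggregation."""
    updated = [local_step(w, d, X, y) for d, (X, y) in zip(deltas, shards)]
    w_new = np.mean([w_i for w_i, _ in updated], axis=0)   # aggregate learner weights
    deltas_new = [d_i for _, d_i in updated]               # perturbations stay per shard
    return w_new, deltas_new

rng = np.random.default_rng(0)
shards = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(4)]
w = np.zeros(3)
deltas = [np.zeros_like(X) for X, _ in shards]
for _ in range(100):
    w, deltas = parallel_round(w, deltas, shards)
print("learner weights after 100 rounds:", w)
```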
Similar resources
One Network to Solve Them All — Solving Linear Inverse Problems using Deep Projection Models
We now describe the architecture of the networks used in the paper. We use the exponential linear unit (elu) [1] as the activation function. We also use virtual batch normalization [6], where the reference batch size b_ref is equal to the batch size used for stochastic gradient descent. We weight the reference batch with b_ref / (b_ref + 1). We define some shorthands for the basic components used in the networks.
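A minimal sketch of the virtual batch normalization weighting mentioned in this snippet, assuming the common formulation in which each example is normalized with statistics of the reference batch combined with that example, the reference batch weighted by b_ref / (b_ref + 1); omitting learned scale and shift parameters is a simplification.

```python
import numpy as np

def virtual_batch_norm(x, ref_batch, eps=1e-5):
    """Normalize each example in x with statistics of the reference batch
    combined with that single example: the reference batch is weighted by
    b_ref / (b_ref + 1) and the example itself by 1 / (b_ref + 1)."""
    b_ref = ref_batch.shape[0]
    w_ref = b_ref / (b_ref + 1.0)
    ref_mean = ref_batch.mean(axis=0)
    ref_sq_mean = (ref_batch ** 2).mean(axis=0)
    out = np.empty_like(x, dtype=float)
    for i, xi in enumerate(x):
        mean = w_ref * ref_mean + (1.0 - w_ref) * xi
        sq_mean = w_ref * ref_sq_mean + (1.0 - w_ref) * xi ** 2
        out[i] = (xi - mean) / np.sqrt(sq_mean - mean ** 2 + eps)
    return out

rng = np.random.default_rng(0)
ref = rng.normal(size=(64, 8))      # reference batch, b_ref = 64
batch = rng.normal(size=(16, 8))    # batch used for stochastic gradient descent
print(virtual_batch_norm(batch, ref).shape)   # (16, 8)
```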
Parallel Dither and Dropout for Regularising Deep Neural Networks
Effective regularisation during training can mean the difference between success and failure for deep neural networks. Recently, dither has been suggested as an alternative to dropout for regularisation during batch-averaged stochastic gradient descent (SGD). In this article, we show that these methods fail without batch averaging and we introduce a new, parallel regularisation method that may be ...
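For context, the sketch below shows what batch-averaged SGD with per-example dither (additive noise) or dropout looks like for a toy linear model; the model, loss, and noise levels are assumptions, and this is not the article's proposed parallel method.

```python
import numpy as np

def batch_averaged_sgd_step(w, X, y, rng, mode="dither", lr=0.01, sigma=0.1, p_drop=0.5):
    """One SGD step for a linear model with squared loss: each example in the
    batch gets its own dither noise (or dropout mask), and the per-example
    gradients are averaged before the weight update."""
    grads = []
    for xi, yi in zip(X, y):
        if mode == "dither":
            xi = xi + rng.normal(scale=sigma, size=xi.shape)            # additive dither
        elif mode == "dropout":
            xi = xi * (rng.random(xi.shape) > p_drop) / (1.0 - p_drop)  # inverted dropout
        grads.append((xi @ w - yi) * xi)        # squared-loss gradient for one example
    return w - lr * np.mean(grads, axis=0)      # batch averaging, then one step

rng = np.random.default_rng(0)
X, y = rng.normal(size=(32, 5)), rng.normal(size=32)
w = np.zeros(5)
for _ in range(200):
    w = batch_averaged_sgd_step(w, X, y, rng, mode="dither")
print("weights after 200 dither-regularised steps:", w)
```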
Parallel Collaborative Filtering for Streaming Data
We present a distributed stochastic gradient descent algorithm for performing low-rank matrix factorization on streaming data. Low-rank matrix factorization is often used as a technique for collaborative filtering. As opposed to recent algorithms that perform matrix factorization in parallel on a batch of training examples [4], our algorithm operates on a stream of incoming examples. We experim...
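A minimal, single-process sketch of SGD-based low-rank matrix factorization over a stream of (user, item, rating) observations, i.e. the collaborative-filtering setting the abstract refers to; the rank, step size, and regularization constants are assumptions, and the distributed aspect of the paper is omitted.

```python
import numpy as np

def stream_mf(stream, n_users, n_items, rank=8, lr=0.05, reg=0.02, seed=0):
    """SGD over a stream of (user, item, rating) triples: each incoming example
    updates only the corresponding user and item factor vectors."""
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank))
    for u, i, r in stream:                      # single pass over incoming examples
        err = r - U[u] @ V[i]                   # prediction error for this example
        U[u], V[i] = (U[u] + lr * (err * V[i] - reg * U[u]),
                      V[i] + lr * (err * U[u] - reg * V[i]))
    return U, V

rng = np.random.default_rng(1)
stream = [(rng.integers(100), rng.integers(50), rng.normal()) for _ in range(5000)]
U, V = stream_mf(stream, n_users=100, n_items=50)
print("factor shapes:", U.shape, V.shape)
```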
Accelerated Mini-Batch Stochastic Dual Coordinate Ascent
Stochastic dual coordinate ascent (SDCA) is an effective technique for solving regularized loss minimization problems in machine learning. This paper considers an extension of SDCA under the minibatch setting that is often used in practice. Our main contribution is to introduce an accelerated minibatch version of SDCA and prove a fast convergence rate for this method. We discuss an implementati...
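For reference, the sketch below shows plain (single-coordinate, non-accelerated) SDCA for ridge regression, the base method that the paper extends to an accelerated mini-batch variant; the closed-form dual update is the standard one for the squared loss, and the data and constants are illustrative.

```python
import numpy as np

def sdca_ridge(X, y, lam=0.1, epochs=20, seed=0):
    """Plain SDCA for ridge regression: each pass visits dual coordinates in
    random order and applies the closed-form update for the squared loss."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)            # dual variables, one per example
    w = np.zeros(d)                # primal weights, w = X^T alpha / (lam * n)
    for _ in range(epochs):
        for i in rng.permutation(n):
            delta = (y[i] - X[i] @ w - alpha[i]) / (1.0 + X[i] @ X[i] / (lam * n))
            alpha[i] += delta
            w += delta * X[i] / (lam * n)
    return w

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.1 * rng.normal(size=200)
print("recovered weights:", np.round(sdca_ridge(X, y), 2))
```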
Stochastic Proximal Gradient Descent with Acceleration Techniques
Proximal gradient descent (PGD) and stochastic proximal gradient descent (SPGD) are popular methods for solving regularized risk minimization problems in machine learning and statistics. In this paper, we propose and analyze an accelerated variant of these methods in the mini-batch setting. This method incorporates two acceleration techniques: one is Nesterov’s acceleration method, and the othe...
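A minimal sketch combining the two ingredients named in this snippet, a proximal step and Nesterov-style acceleration, applied to an L1-regularized least-squares problem with mini-batch stochastic gradients; it is not the paper's exact algorithm, and all constants are assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def accel_spgd(X, y, lam=0.05, lr=0.01, batch=16, iters=500, seed=0):
    """Mini-batch stochastic proximal gradient with Nesterov-style momentum
    for L1-regularized least squares."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    x_prev = x = np.zeros(d)
    for k in range(1, iters + 1):
        momentum = (k - 1) / (k + 2)                     # FISTA-style momentum weight
        v = x + momentum * (x - x_prev)                  # Nesterov extrapolation point
        idx = rng.choice(n, size=batch, replace=False)   # draw a mini-batch
        grad = X[idx].T @ (X[idx] @ v - y[idx]) / batch  # stochastic gradient at v
        x_prev, x = x, soft_threshold(v - lr * grad, lr * lam)   # proximal step
    return x

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.0, 0.5]                            # sparse ground truth
y = X @ w_true + 0.1 * rng.normal(size=400)
print("nonzero coefficients:", np.flatnonzero(np.round(accel_spgd(X, y), 2)))
```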